fix: retry event watcher blocks after RPC failures #349
Pull request overview
This PR fixes ContractEventWatcher.sync_to() so it only advances the cursor past blocks whose event retrieval succeeded, preventing silent event loss when transient RPC failures occur (Fixes #201).
Changes:
- Make `process_block()` return a success flag and stop the sync loop on retrieval failures.
- Advance `cursor` to the last successfully processed block instead of unconditionally to the end of the window.
- Add regression tests covering clean sync, mid-window failure, and retry after recovery.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `allways/validator/event_watcher.py` | Track last successfully processed block and halt cursor advancement when block/event retrieval fails. |
| `tests/test_event_watcher.py` | Add tests to ensure cursor stops before a failed block and retries successfully on a subsequent sync. |
      try:
          block_hash = self.substrate.get_block_hash(block_num)
          if not block_hash:
    -         return
    +         return True
          events = self.substrate.get_events(block_hash=block_hash)
      except Exception as e:
          bt.logging.debug(f'EventWatcher: block {block_num} events unavailable: {e}')
    -     return
    +     return False

      for block_num in range(self.cursor + 1, end + 1):
    -     self.process_block(block_num)
    - self.cursor = end
    +     if not self.process_block(block_num):
    +         break
    +     last_processed = block_num
    + self.cursor = last_processed
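The loop change above can be sketched as a standalone function to make the cursor rule easier to see. This is a minimal sketch, not the code in `event_watcher.py`: `advance_cursor` and `fake_process_block` are hypothetical names, and initializing `last_processed` from the current cursor is an assumption, since that line is not visible in the hunk.

```python
from typing import Callable


def advance_cursor(cursor: int, end: int, process_block: Callable[[int], bool]) -> int:
    """Return the new cursor after trying blocks cursor+1..end in order.

    Hypothetical helper: stops at the first block whose processing reports
    failure, so that block is retried on the next call.
    """
    last_processed = cursor  # assumption: start from the current cursor
    for block_num in range(cursor + 1, end + 1):
        if not process_block(block_num):
            break  # do not move past a block whose events were not retrieved
        last_processed = block_num
    return last_processed


# Simulate a transient RPC failure on block 12.
failing = {12}


def fake_process_block(block_num: int) -> bool:
    """Hypothetical stand-in for process_block(): False means retrieval failed."""
    return block_num not in failing


first = advance_cursor(10, 15, fake_process_block)      # stops before block 12
failing.clear()                                         # the RPC recovers
second = advance_cursor(first, 15, fake_process_block)  # resumes from block 12
```

With the old behavior the first call would have moved the cursor to 15 and blocks 12 through 15 would never be revisited; here the second call picks up at block 12.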
    def test_transient_block_fetch_failure_stops_cursor_before_failed_block(self, tmp_path: Path):
        w = make_watcher(tmp_path)
        w.cursor = 10

        def get_block_hash(block_num: int):
            if block_num == 12:
                raise RuntimeError('rpc timeout')
            return f'hash-{block_num}'

        w.substrate.get_block_hash.side_effect = get_block_hash
        w.substrate.get_events.return_value = []
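The test excerpt above is cut off before its sync call and assertions. The following is a minimal, self-contained sketch of how a test of this shape might check both the stop and the retry. It does not reproduce the PR's test: `MiniWatcher` is a hypothetical stand-in for `ContractEventWatcher`, the `sync_to(end)` signature is an assumption, and `make_watcher` is replaced by a plain `MagicMock` substrate.

```python
from unittest.mock import MagicMock


class MiniWatcher:
    """Hypothetical stand-in that models only the cursor logic."""

    def __init__(self, substrate, cursor: int):
        self.substrate = substrate
        self.cursor = cursor

    def process_block(self, block_num: int) -> bool:
        try:
            block_hash = self.substrate.get_block_hash(block_num)
            if not block_hash:
                return True  # no hash: treated as nothing to process
            self.substrate.get_events(block_hash=block_hash)
        except Exception:
            return False  # retrieval failed: caller must not advance past it
        return True

    def sync_to(self, end: int) -> None:  # assumed signature
        last_processed = self.cursor
        for block_num in range(self.cursor + 1, end + 1):
            if not self.process_block(block_num):
                break
            last_processed = block_num
        self.cursor = last_processed


def test_cursor_stops_then_retries():
    substrate = MagicMock()
    failing = {12}

    def get_block_hash(block_num: int):
        if block_num in failing:
            raise RuntimeError('rpc timeout')
        return f'hash-{block_num}'

    substrate.get_block_hash.side_effect = get_block_hash
    substrate.get_events.return_value = []
    w = MiniWatcher(substrate, cursor=10)

    # First sync: block 11 succeeds, block 12 raises, cursor stays at 11.
    w.sync_to(15)
    assert w.cursor == 11
    substrate.get_events.assert_called_once_with(block_hash='hash-11')

    # RPC recovers: the next sync retries block 12 and reaches the end.
    failing.clear()
    w.sync_to(15)
    assert w.cursor == 15
    assert substrate.get_events.call_count == 5
```

Asserting on `get_events` calls as well as on the cursor guards against a regression where the loop keeps fetching later blocks after a failure.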
Updated this branch to address the actionable Copilot review items:
I did not add pruned/missing-block special casing because there does not appear to be a clean existing error taxonomy for that path; string-matching on provider error messages would be brittle.
Verification:
ea4391f to 3039b8d
Rebased this branch onto the latest Verification after rebase:
3039b8d to 48e663b
Summary
- Make `process_block()` report whether block event retrieval succeeded

Fixes #201
Note: #339 also touches `event_watcher.py` for state persistence, so this may need a small rebase if that lands first.

Tests
- `uv run pytest tests/test_event_watcher.py -q`
- `uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py`
- `uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py`
- `git diff --check`